Roles of Macro-Actions in Accelerating Reinforcement Learning
Authors
Abstract
We analyze the use of built-in policies, or macro-actions, as a form of domain knowledge that can improve the speed and scaling of reinforcement learning algorithms. Such macro-actions are often used in robotics, and macro-operators are also well-known as an aid to state-space search in AI systems. The macro-actions we consider are closed-loop policies with termination conditions. The macro-actions can be chosen at the same level as primitive actions. Macro-actions commit the learning agent to act in a particular, purposeful way for a sustained period of time. Overall, macro-actions may either accelerate or retard learning, depending on the appropriateness of the macro-actions to the particular task. We analyze their effect in a simple example, breaking the acceleration effect into two parts: 1) the effect of the macro-action in changing exploratory behavior, independent of learning, and 2) the effect of the macro-action on learning, independent of its effect on behavior. In our example, both effects are significant, but the latter appears to be larger. Finally, we provide a more complex gridworld illustration of how appropriately chosen macro-actions can accelerate overall learning.

Many problems in artificial intelligence (AI) are too large to be solved practically by searching the state-space using available primitive operators. By searching for the goal using only primitive operators, the AI system is bounded by both the depth and the breadth of the search tree. One way to overcome this difficulty is through macro-actions (or macros). By chunking together primitive actions into macro-actions, the effective length of the solution is shortened. Both [Korf, 1985] and [Iba, 1989] have demonstrated that using macro-actions to search for a solution has resulted in solutions in cases where the system was unable to find answers by searching in primitive state-space, and in finding faster solutions in cases where both systems could solve the problem. Reinforcement learning (RL) is a collection of methods for discovering near-optimal solutions to stochastic sequential decision problems [Watkins, 1989]. An RL system interacts with the environment by executing actions and receiving rewards from the environment. Unlike supervised learning, RL does not rely on an outside teacher to specify the correct action for a given state. Instead, an RL system tries different actions and uses the feedback from the environment to determine a closed-loop policy which maximizes reward. In this work, we treat macro-actions as closed-loop policies with termination conditions. Prior work that has included closed-loop macro- …
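The abstract's central idea (primitive moves and multi-step, closed-loop macro-actions competing in the same action set, with learning updates that account for a macro's duration) can be sketched in a few lines of Python. This is not the authors' code: the 10x10 grid, the goal in the far corner, the -1-per-step reward, the "run in one direction until blocked" macros, and all hyperparameters are illustrative assumptions, and the update is a standard SMDP-style Q-learning backup that discounts by the number of primitive steps the macro actually took.

import random
from collections import defaultdict

SIZE = 10                                  # assumed 10x10 gridworld
GOAL = (SIZE - 1, SIZE - 1)                # assumed goal in the far corner
PRIMITIVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ACTIONS = list(PRIMITIVES) + ["macro_" + d for d in PRIMITIVES]

def move(state, direction):
    """Apply one primitive move; bumping into a wall leaves the state unchanged."""
    r, c = state
    dr, dc = PRIMITIVES[direction]
    nr, nc = r + dr, c + dc
    return (nr, nc) if 0 <= nr < SIZE and 0 <= nc < SIZE else state

def run_macro(state, direction):
    """Closed-loop macro-action: keep applying one primitive move until the
    termination condition holds (blocked by a wall, or the goal is reached)."""
    steps = 0
    while True:
        nxt = move(state, direction)
        if nxt == state:                   # blocked by a wall: terminate
            return state, steps
        state, steps = nxt, steps + 1
        if state == GOAL:                  # goal reached: terminate
            return state, steps

def q_learning(episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning over a mixed action set of primitives and macros."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(10_000):            # step cap keeps early episodes finite
            if s == GOAL:
                break
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, b)])
            if a.startswith("macro_"):
                s2, n = run_macro(s, a.split("_", 1)[1])
                k = max(n, 1)              # assume a blocked macro still costs one step
            else:
                s2, k = move(s, a), 1
            reward = -sum(gamma ** i for i in range(k))   # -1 per primitive step, discounted
            best_next = max(Q[(s2, b)] for b in ACTIONS)
            # SMDP-style backup: discount the successor value by the macro's duration k
            Q[(s, a)] += alpha * (reward + gamma ** k * best_next - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()

Because the macros here drive the agent across the grid in long strides, they change both where exploration goes and how far a single backup propagates value, which is the separation of effects the abstract describes; poorly chosen macros would illustrate the "retard learning" case equally well.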
Similar resources
Description and Acquirement of Macro-Actions in Reinforcement Learning
Reinforcement learning is a framework for enabling agents to learn from interaction with their environments. It has focused generally on Markov decision process (MDP) domains, but a domain may be non-Markovian in the real world. In this paper, we develop a new description of macro-actions for non-Markov decision process (NMDP) domains in reinforcement learning. A macro-action is an action control struct...
Macro-Actions in Reinforcement Learning: An Empirical Analysis (Amy McGovern and Richard ...)
Several researchers have proposed reinforcement learning methods that obtain advantages in learning by using temporally extended actions, or macro-actions, but none has carefully analyzed what these advantages are. In this paper, we separate and analyze two advantages of using macro-actions in reinforcement learning: the effect on exploratory behavior, independent of learning, and the effect on t...
A Pilot Study on the Evolution of Reward Signals for Hierarchical Reinforcement Learning
Recent research has shown that reinforcement learning agents can be greatly advantaged by the possibility of learning to select macro actions instead of, or beside, fine primitive actions. The route usually followed to exploit this idea is to build agents with hierarchical architectures that can learn both a repertoire of macro actions and a macro policy that selects them, on the basis of the "fin...
Planning with Closed-Loop Macro Actions
Planning and learning at multiple levels of temporal abstraction is a key problem for artificial intelligence. In this paper we summarize an approach to this problem based on the mathematical framework of Markov decision processes and reinforcement learning. Conventional model-based reinforcement learning uses primitive actions that last one time step and that can be modeled independently of th...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Journal title:
Volume, Issue:
Pages: -
Publication date: 1997